PART-A

1. Import and Understand the data [12 Marks]

A. Extract ‘plant-seedlings-classification.zip’ into new folder (unzipped) using python. [2 Marks]

Hint: You can extract it manually, but you will lose 2 marks.

plant-seedlings-classification.zip is extracted as shown above.
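
The extraction step can be sketched with Python's standard zipfile module. This is a minimal sketch, not necessarily the code used here; the helper name extract_zip and the assumption that the archive sits in the working directory are illustrative.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir="unzipped"):
    """Extract every member of zip_path into dest_dir and return the folder path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


# Example call, assuming the archive is in the current working directory:
# extract_zip("plant-seedlings-classification.zip", "unzipped")
```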

B. Map the images from train folder with train labels to form a DataFrame. [6 Marks]

Hint: Create a DataFrame with 3 columns: Name of image, Species/class/type of image & actual image.

The pandas dataframe is created with 3 columns:
img_name - Name of image
img_class - Species/class/type of image
img - actual image
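
One way to build such a DataFrame is sketched below. It assumes the layout train/<class name>/<image file>, so the parent folder of each file is its label. The helper name build_image_frame is hypothetical, and the image loader is passed in as a parameter because the image library used is not stated here (cv2.imread is one common choice).

```python
from pathlib import Path

import pandas as pd


def build_image_frame(train_dir, load_image):
    """Return a DataFrame with columns img_name, img_class and img.

    Assumes train_dir/<class name>/<image file>; load_image is any callable
    that turns a file path (str) into an image array.
    """
    rows = []
    for path in sorted(Path(train_dir).glob("*/*")):
        if path.is_file():
            rows.append(
                {
                    "img_name": path.name,           # file name of the image
                    "img_class": path.parent.name,   # folder name is the label
                    "img": load_image(str(path)),    # the loaded image itself
                }
            )
    return pd.DataFrame(rows, columns=["img_name", "img_class", "img"])
```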

C. Write a function that will select n random images and display images along with its species. [4 Marks]

Hint: If input for function is 5, it should print 5 random images along with its labels.

2. Data preprocessing [8 Marks]

A. Create X & Y from the DataFrame. [2 Marks]

B. Encode labels of the images. [2 Marks]
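
Label encoding can be illustrated with NumPy alone. The sketch below is an assumption about the approach, not the encoder actually used: it maps each class name to an integer id and then to a one-hot vector, and keeps the class list so ids can be mapped back to names.

```python
import numpy as np


def encode_labels(labels):
    """Map string labels to integer ids and one-hot vectors.

    Returns (classes, ids, one_hot); classes[ids[i]] recovers labels[i].
    """
    classes, ids = np.unique(np.asarray(labels), return_inverse=True)
    one_hot = np.eye(len(classes), dtype="float32")[ids]
    return classes, ids, one_hot


classes, ids, one_hot = encode_labels(["Maize", "Charlock", "Maize"])
# np.unique sorts the class names, so "Charlock" gets id 0 and "Maize" id 1.
```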

C. Unify shape of all the images. [2 Marks]

The images have different dimensions.

The images are resized to (80, 80, 3).
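
To show what unifying the shape involves, here is a nearest-neighbour resize written in plain NumPy. It is a stand-in for illustration only; a library resize (for example from OpenCV) would normally be used and would interpolate more smoothly. The function name resize_nearest is hypothetical.

```python
import numpy as np


def resize_nearest(img, height=80, width=80):
    """Resize an (H, W, C) image to (height, width, C) by nearest-neighbour sampling."""
    src_h, src_w = img.shape[:2]
    rows = np.arange(height) * src_h // height  # source row for each output row
    cols = np.arange(width) * src_w // width    # source column for each output column
    return img[rows[:, None], cols]
```

Applying this to every image gives arrays of identical shape, which can then be stacked into a single 4-D array.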

D. Normalise all the images. [2 Marks]
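
A common way to normalise images, assuming 8-bit pixels in the range 0 to 255, is to divide by 255 so every value lies between 0 and 1. The sketch below makes that assumption explicit; the helper name normalise is illustrative.

```python
import numpy as np


def normalise(images):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0] as float32."""
    return np.asarray(images, dtype="float32") / 255.0
```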

3. Model training [10 Marks]

Checkpoint: Make sure the shape of X is (number of images, height, width, number of channels). If it is not, correct it; otherwise it will cause problems during model training.
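
The checkpoint can be turned into an explicit check. The sketch below assumes the 80 x 80 x 3 target size mentioned earlier; the function name check_image_batch is hypothetical.

```python
import numpy as np


def check_image_batch(X, height=80, width=80, channels=3):
    """Raise ValueError unless X has shape (n_images, height, width, channels)."""
    X = np.asarray(X)
    expected = (height, width, channels)
    if X.ndim != 4 or X.shape[1:] != expected:
        raise ValueError(f"expected (N, {height}, {width}, {channels}), got {X.shape}")
    return X
```

If the images are still held as separate arrays of equal shape, np.stack on the list of arrays produces the required 4-D array.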

A. Split the data into train and test data. [2 Marks]

B. Create new CNN architecture to train the model. [4 Marks]

C. Train the model on train data and validate on test data. [2 Marks]

The model reaches 99.50% train accuracy and 85.68% test accuracy, with a train loss of about 0.02 and a test loss of about 0.54. The gap between the train and test figures suggests some overfitting, but the model still looks promising for classifying plant seedlings.
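
Accuracy and loss measure different things, which the following sketch illustrates. It assumes one-hot labels and a softmax output trained with categorical cross-entropy, which is typical for this kind of classifier but is an assumption here. Accuracy is a fraction between 0 and 1, while cross-entropy is an unbounded non-negative value rather than a percentage.

```python
import numpy as np


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class matches the one-hot label."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(y_true, axis=1)))


def categorical_cross_entropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(probs), axis=1)))
```

A model can be correct on a sample and still add to the loss if it is not confident, so a high accuracy and a noticeable loss can coexist.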

D. Select a random image and print actual label and predicted label for the same. [2 Marks]

A random image, /plant-seedlings-classification/train/Maize/1b1ab91eb.png, was selected from the class Maize, and the model correctly identified it as Maize. This is consistent with the test accuracy reported above, although a single prediction cannot by itself confirm that the model classifies plant seedlings well.

PART-B

1. Import and Understand the data [5 Marks]

A. Import and read oxflower17 dataset from tflearn and split into X and Y while loading. [2 Marks]

Hint: It can be imported from tflearn.datasets. If tflearn is not installed, install it. It can be loaded using: x, y = oxflower17.load_data()

The dataset is loaded and split into x and y.

B. Print Number of images and shape of the images. [1 Marks]

C. Print count of each class from y. [2 Marks]
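
Counting images per class can be sketched as follows. The helper accepts either integer labels or one-hot rows, since the label format of y depends on how the dataset was loaded; the name class_counts is hypothetical.

```python
import numpy as np


def class_counts(y):
    """Return {class id: number of images}; accepts integer labels or one-hot rows."""
    y = np.asarray(y)
    if y.ndim == 2:  # one-hot encoded labels: recover the class id per row
        y = y.argmax(axis=1)
    values, counts = np.unique(y, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}
```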

A. Display 5 random images. [1 Marks]

B. Select any image from the dataset and assign it to a variable. [1 Marks]

C. Transform the image into grayscale format and display the same. [3 Marks]
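
Grayscale conversion is a weighted sum of the colour channels. The sketch below uses the standard ITU-R BT.601 luma weights and assumes the channels are in RGB order; images loaded in BGR order would need their channels reversed first.

```python
import numpy as np


def to_grayscale(img):
    """Convert an (H, W, 3) RGB image to (H, W) using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype="float32")  # R, G, B
    return np.asarray(img, dtype="float32") @ weights
```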

D. Apply a filter to sharpen the image and display the image before and after sharpening. [2 Marks]

E. Apply a filter to blur the image and display the image before and after blur. [2 Marks]
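
Sharpening and blurring are both 3 x 3 kernel filters and differ only in the kernel values. The sketch below applies two commonly used kernels to a single-channel image with plain NumPy; the specific kernels and the helper name filter2d are assumptions for illustration, and a library routine would normally be used instead.

```python
import numpy as np

# Sharpen: boost the centre pixel and subtract its four neighbours.
SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype="float32")
# Blur: 3x3 box filter, i.e. the mean of the pixel and its eight neighbours.
BLUR = np.full((3, 3), 1.0 / 9.0, dtype="float32")


def filter2d(gray, kernel):
    """Apply a 3x3 kernel to a 2-D image, replicating edge pixels at the border.

    This is a correlation; for symmetric kernels like the two above it is
    identical to a convolution.
    """
    gray = np.asarray(gray, dtype="float32")
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    out = np.zeros((h, w), dtype="float32")
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

Both kernels sum to 1, so flat regions keep their brightness; only edges and fine detail are amplified (sharpen) or smoothed (blur).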

F. Display all the 4 images from above questions beside each other to observe the difference. [1 Marks]

The differences among the original, grayscale, blurred and sharpened images are clearly visible.

3. Model training and Tuning: [15 Marks]

A. Split the data into train and test with 80:20 proportion. [2 Marks]
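
An 80:20 split amounts to shuffling the sample indices and cutting them at the 20% mark. The sketch below shows this with NumPy; the function name split_80_20 and the fixed seed are illustrative assumptions. In practice scikit-learn's train_test_split with test_size=0.2 is the usual tool.

```python
import numpy as np


def split_80_20(X, y, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into train and test parts."""
    X, y = np.asarray(X), np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(X))  # shuffled indices
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```

The same index order is applied to X and y, so each image stays paired with its label.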

B. Train a model using any Supervised Learning algorithm and share performance metrics on test data. [3 Marks]

Insights on test data prediction:

Precision: out of all samples predicted as a given class, the fraction that actually belong to that class.
Recall (sensitivity or TPR): out of all samples that actually belong to a given class, the fraction the model identified correctly.
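
Both metrics can be read off a confusion matrix: per class, precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below computes them from integer labels with NumPy; the function name precision_recall is hypothetical, and classes with no predictions or no samples are given a score of 0 by convention here.

```python
import numpy as np


def precision_recall(y_true, y_pred, n_classes):
    """Per-class precision and recall from integer labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)  # rows: actual, columns: predicted
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # TP + FP per class
    actual = cm.sum(axis=1)     # TP + FN per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall
```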

The logistic regression model scores 37% accuracy on the test data.

C. Train a model using Neural Network and share performance metrics on test data. [4 Marks]

The trained neural network has a test accuracy of 39.33%, slightly better than the 37% of the logistic regression model.

D. Train a model using a basic CNN and share performance metrics on test data. [4 Marks]

As the graphs above show, the model reaches a validation accuracy of 69.48%, which stays flat after around 50 epochs, meaning no further learning takes place. Training accuracy reaches 100% after about 25 epochs, and the gap to the validation accuracy indicates overfitting. For this architecture, further training brings no improvement. This model is much better than the previous logistic regression and neural network classifiers.

E. Predict the class/label of image ‘Prediction.jpg’ using best performing model and share predicted label. [2 Marks]

The test image shows a white flower, and the predicted class is also a white-flowered species. The model's validation accuracy was about 70%. With this model, we were able to predict a class close to the one the test image belongs to.

THE END